60 research outputs found
Faster Query Answering in Probabilistic Databases using Read-Once Functions
A boolean expression is in read-once form if each of its variables appears
exactly once. When the variables denote independent events in a probability
space, the probability of the event denoted by the whole expression in
read-once form can be computed in polynomial time (whereas the general problem
for arbitrary expressions is #P-complete). Known approaches to checking
read-once property seem to require putting these expressions in disjunctive
normal form. In this paper, we tell a better story for a large subclass of
boolean event expressions: those that are generated by conjunctive queries
without self-joins and on tuple-independent probabilistic databases. We first
show that given a tuple-independent representation and the provenance graph of
an SPJ query plan without self-joins, we can, without using the DNF of a result
event expression, efficiently compute its co-occurrence graph. From this, the
read-once form can already, if it exists, be computed efficiently using
existing techniques. Our second and key contribution is a complete, efficient,
and simple to implement algorithm for computing the read-once forms (whenever
they exist) directly, using a new concept, that of co-table graph, which can be
significantly smaller than the co-occurrence graph.Comment: Accepted in ICDT 201
Model Counting of Query Expressions: Limitations of Propositional Methods
Query evaluation in tuple-independent probabilistic databases is the problem
of computing the probability of an answer to a query given independent
probabilities of the individual tuples in a database instance. There are two
main approaches to this problem: (1) in `grounded inference' one first obtains
the lineage for the query and database instance as a Boolean formula, then
performs weighted model counting on the lineage (i.e., computes the probability
of the lineage given probabilities of its independent Boolean variables); (2)
in methods known as `lifted inference' or `extensional query evaluation', one
exploits the high-level structure of the query as a first-order formula.
Although it is widely believed that lifted inference is strictly more powerful
than grounded inference on the lineage alone, no formal separation has
previously been shown for query evaluation. In this paper we show such a formal
separation for the first time.
We exhibit a class of queries for which model counting can be done in
polynomial time using extensional query evaluation, whereas the algorithms used
in state-of-the-art exact model counters on their lineages provably require
exponential time. Our lower bounds on the running times of these exact model
counters follow from new exponential size lower bounds on the kinds of d-DNNF
representations of the lineages that these model counters (either explicitly or
implicitly) produce. Though some of these queries have been studied before, no
non-trivial lower bounds on the sizes of these representations for these
queries were previously known.Comment: To appear in International Conference on Database Theory (ICDT) 201
- …